Semandaq: a data quality system based on conditional functional dependencies
Authors
Abstract
We present SEMANDAQ, a prototype system for improving the quality of relational data. Based on the recently proposed conditional functional dependencies (CFDs), it detects and repairs errors and inconsistencies that emerge as violations of these constraints. We demonstrate the following functionalities supported by SEMANDAQ: (a) an interface for specifying CFDs; (b) a visual tool for automated detection of CFD violations in relational data, leveraging efficient SQL-based techniques; (c) extensive visual data exploration capabilities that provide the user with various measures of the quality of the data; (d) repair (cleaning) functionality without excess human interaction, built upon CFD-based cleaning algorithms; we show how SEMANDAQ allows for a natural exploration of the quality of the obtained repairs. SEMANDAQ is a promising tool that provides easy access and user-friendly data quality facilities for any relational database system.
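The SQL-based detection of CFD violations mentioned in the abstract can be illustrated with a small sketch. This is not SEMANDAQ's own code or its actual queries; the `cust` table, its columns, the two example CFDs, and the helper `find_cfd_violations` are all assumptions made up for illustration. A constant CFD (a pattern that fixes the right-hand value) can be checked one row at a time, while a variable CFD (a functional dependency that only holds on rows matching a pattern) requires comparing rows within a group.

```python
import sqlite3


def find_cfd_violations(conn):
    """Check two hypothetical CFDs on a hypothetical `cust` table.

    Returns (ids violating the constant CFD, zip codes violating the variable CFD).
    """
    # Constant CFD (assumed): cc = '01' and ac = '908' implies city = 'MH'.
    # A single row can violate it on its own.
    constant = conn.execute(
        "SELECT id FROM cust "
        "WHERE cc = '01' AND ac = '908' AND city <> 'MH' "
        "ORDER BY id"
    ).fetchall()

    # Variable CFD (assumed): among rows with cc = '44', zip determines street.
    # A violation is a zip code that appears with more than one street.
    variable = conn.execute(
        "SELECT zip FROM cust "
        "WHERE cc = '44' "
        "GROUP BY zip "
        "HAVING COUNT(DISTINCT street) > 1 "
        "ORDER BY zip"
    ).fetchall()

    return [row[0] for row in constant], [row[0] for row in variable]


# Toy data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust (id INTEGER PRIMARY KEY, cc TEXT, ac TEXT, "
    "city TEXT, street TEXT, zip TEXT)"
)
conn.executemany(
    "INSERT INTO cust VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "44", "131", "EDI", "Mayfield", "EH4 8LE"),
        (2, "44", "131", "EDI", "Crichton", "EH4 8LE"),  # same zip, other street
        (3, "01", "908", "NYC", "Mtn Ave", "07974"),     # city should be 'MH'
        (4, "01", "908", "MH", "Mtn Ave", "07974"),
        (5, "01", "212", "NYC", "Main St", "10001"),
    ],
)

constant_violations, variable_violations = find_cfd_violations(conn)
print(constant_violations, variable_violations)
```

In this sketch each CFD is hard-coded as its own query; a real system would more plausibly store the pattern tuples in a table and join against it, so that the number of queries does not grow with the number of patterns.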
Related resources
Approximation Measures for Conditional Functional Dependencies Using Stripped Conditional Partitions
Received Apr 11, 2017; revised May 5, 2017; accepted May 24, 2017. Conditional functional dependencies (CFDs) have been used to improve the quality of data, including detecting and repairing data inconsistencies. Approximation measures have significant importance for data dependencies in data mining. To adapt to exceptions in real data, the measures are used to relax the strictness of CFDs for mor...
Discover Dependencies from Data - A Review
Functional and inclusion dependency discovery is important to knowledge discovery, database semantics analysis, database design, and data quality assessment. Motivated by the importance of dependency discovery, this paper reviews the methods for functional dependency, conditional functional dependency, approximate functional dependency and inclusion dependency discovery in relational databases ...
Analyses and Validation of Conditional Dependencies with Built-in Predicates
This paper proposes a natural extension of conditional functional dependencies (CFDs [14]) and conditional inclusion dependencies (CINDs [8]), denoted by CFD^ps and CIND^ps, respectively, by specifying patterns of data values with ≠, <, ≤, > and ≥ predicates. As data quality rules, CFD^ps and CIND^ps are able to capture errors that commonly arise in practice but cannot be detected by CFDs and CINDs. We...
Discovering Conditional Functional Dependencies to Detect Data Inconsistencies
Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. In this paper, we present an approach that efficiently and robustly discovers conditional functional dependencies for detecting inconsistencies in data and hence improves data quality. We evaluate our approach empirically...
Mining Constant Conditional Functional Dependencies for Improving Data Quality
This paper applies data mining techniques to data cleaning, as an effective means of discovering Constant Conditional Functional Dependencies (CCFDs) from relational databases. These CCFDs are used as business rules for context-dependent data validation. Conditional Functional Dependencies (CFDs) are an extension of functional dependencies (FDs) that capture the consistency of data by su...
Journal:
- PVLDB
Volume 1, Issue
Pages -
Publication date 2008